Hierarchical Classification of Web Pages Using Support Vector Machine

Authors

  • Yi Wang
  • Zhiguo Gong
Abstract

This paper presents a method for hierarchical classification of web pages. In our approach, a support vector machine (SVM) is the base algorithm that separates any two sub-categories under the same parent node. To alleviate the adverse shift of the SVM classifier caused by imbalanced training data, we combine the original SVM classifier with the BEV algorithm to create a classifier called VOTEM. A web document is then assigned to a sub-category by voting among all category-to-category classifiers. The hierarchical classification algorithm starts at the top of the category tree and proceeds downward recursively until it triggers a stop condition or reaches a leaf node. Our experiments indicate that the proposed algorithm obtains better results.
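The top-down voting procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' VOTEM implementation: the abstract does not specify the BEV step or the stop condition, so the sketch omits BEV and assumes a simple stop rule (halt when the winning child's share of its pairwise contests falls below `min_share`). The names `Node`, `vote`, `classify`, `keyword_rule`, and `min_share` are hypothetical, and the keyword rules only stand in for trained pairwise SVM classifiers.

```python
from collections import Counter
from itertools import combinations


class Node:
    """A category in the hierarchy; `pairwise` holds one binary classifier per pair of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.pairwise = {}  # (name_a, name_b) -> callable(doc) returning name_a or name_b


def vote(node, doc):
    """Run every pairwise classifier under `node` and return (winning child, vote share)."""
    if len(node.children) == 1:
        return node.children[0], 1.0
    tally = Counter()
    for a, b in combinations(node.children, 2):
        tally[node.pairwise[(a.name, b.name)](doc)] += 1
    best_name, best_votes = tally.most_common(1)[0]
    winner = next(c for c in node.children if c.name == best_name)
    # A child takes part in len(children) - 1 contests, so that is its maximum score.
    return winner, best_votes / (len(node.children) - 1)


def classify(root, doc, min_share=0.5):
    """Descend from the root, stopping at a leaf or when the vote is too weak (assumed rule)."""
    path, node = [], root
    while node.children:
        winner, share = vote(node, doc)
        if share < min_share:
            break
        path.append(winner.name)
        node = winner
    return path


def keyword_rule(a, words_a, b):
    """Hypothetical stand-in for a trained SVM: pick `a` if the page mentions any of `words_a`."""
    return lambda doc: a if words_a & set(doc.lower().split()) else b


# Toy two-level hierarchy (illustrative categories, not from the paper).
football, tennis = Node("football"), Node("tennis")
sports = Node("sports", [football, tennis])
science, arts = Node("science"), Node("arts")
root = Node("root", [sports, science, arts])

sport_words = {"match", "goal", "racket"}
sports.pairwise[("football", "tennis")] = keyword_rule("football", {"goal", "striker"}, "tennis")
root.pairwise[("sports", "science")] = keyword_rule("sports", sport_words, "science")
root.pairwise[("sports", "arts")] = keyword_rule("sports", sport_words, "arts")
root.pairwise[("science", "arts")] = keyword_rule("science", {"experiment", "theory"}, "arts")

print(classify(root, "The striker scored a late goal in the match"))
# prints ['sports', 'football']
```

In a real system each entry of `pairwise` would be a binary classifier trained on the documents of the two sibling categories, and the returned path would be the chain of categories assigned to the page.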

Similar resources

Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

Most Web page classification models typically apply the bag of words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply the topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics. Terms assigne...

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. It is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

Spectral-spatial classification of hyperspectral images by combining hierarchical and marker-based Minimum Spanning Forest algorithms

Many studies have demonstrated that spatial information can play an important role in the classification of hyperspectral imagery. This study proposes a modified spectral–spatial approach for improving the classification of hyperspectral images. In the proposed method ten spatial/texture features, using mean, standard deviation, contrast, homogeneity, corr...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression gives access to a wealth of information with thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data by a statistical method, select valuable high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...

Classifying Malicious Web Pages by Using an Adaptive Support Vector Machine

In order to classify a web page as being benign or malicious, we designed 14 basic and 16 extended features. The basic features that we implemented were selected to represent the essential characteristics of a web page. The system heuristically combines two basic features into one extended feature in order to effectively distinguish benign and malicious pages. The support vector machine can be ...

Publication date: 2008